Cancer Research — Latest Matching Preprints

1

Sensitive Glioma Detection and Recurrence Monitoring Using a Machine Learning Model Based on Circulating Monocytes

Wu, W.; Chai, R.; Xia, P.; Wu, L.; Yu, B.; Chen, X.; Pang, B.; Chen, D.; Wang, Y.; Wang, N.; Li, X.; Liu, H.; Deng, Q.; Wan, F.; Lyu, F.; Wang, L.; Zhang, W.; Zhang, J.; Jiang, T.; Wang, Q.

2026-06-01 oncology 10.64898/2026.05.29.26354409 medRxiv

Top 3%

0.8%

Background: Non-invasive diagnosis and reliable recurrence surveillance remain critical unmet needs in gliomas. Glioma induces profound systemic immune alterations despite its anatomical confinement to the central nervous system. Circulating immune cells, particularly monocytes, are key mediators of tumor-host crosstalk and may retain tumor-induced transcriptional imprints. However, their potential clinical utility as blood-based biomarkers for detection and monitoring remains largely unexplored. Methods and findings: We performed integrated single-cell RNA sequencing of blood immune cells and demonstrated that circulating CD14+ monocytes are significantly expanded in glioma patients, exhibiting features of differentiation arrest and increased transcriptional plasticity. These cells harbor glioma-specific molecular signatures distinct from those observed in healthy controls and patients with other tumors. Leveraging these findings, we developed an ensemble machine learning diagnostic model based on transcriptomic profiles of circulating CD14+ monocytes (training cohort, n=107), which achieved a mean area under the receiver operating characteristic curve (AUC) of 0.971 during cross-validation. In an independent cohort of 567 participants, the model maintained high diagnostic accuracy, yielding an AUC of 0.877 for distinguishing glioma from controls and other tumors. It also achieved a recurrence detection AUC of 0.969 in 51 postoperative samples. Moreover, in a prospective follow-up study involving 30 glioma patients, lower postoperative model-derived scores were significantly associated with prolonged progression-free survival (log-rank test, P=0.043), supporting the model's prognostic utility. Conclusion: We demonstrate that circulating CD14+ monocytes undergo glioma-specific transcriptional reprogramming, generating a systemic tumor-associated signal that can be captured via transcriptomic profiling. This blood-based diagnostic model provides a non-invasive, scalable approach for glioma detection, recurrence surveillance, and outcome prediction.
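
For readers unfamiliar with the AUC figures quoted in this abstract, the sketch below computes an AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name and the scores are hypothetical illustrations, not the study's model or data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score); ties count as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (not taken from the preprint).
glioma_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2]

auc = roc_auc(glioma_scores, control_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in sample size; it is used here only because it makes the definition explicit.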

2

Tracking the Dynamic Trajectories: A Global-to-Local Pharmacovigilance Analysis of GLP-1 Receptor Agonists

Lu, S.; Ruan, X.; Wang, L.; Wang, X.; Sameer, M.; Liu, H.

2026-06-01 health informatics 10.64898/2026.05.28.26354401 medRxiv

Top 3%

0.7%

Although GLP-1/GIP receptor agonists demonstrate unprecedented weight-loss efficacy, their rapid clinical adoption has revealed significant real-world tolerability challenges. To evaluate their dynamic safety profiles, we developed a macro-to-micro pharmacovigilance framework that combines global FAERS reports with local UT Physician EHR data. At the macro level, disproportionality analysis of FAERS identified 17 adverse events shared across the drug class. At the micro level, local EHR data (289,655 longitudinal treatment sessions across 71,316 patients) showed that 51.6% of GLP-1 sessions terminated within 90 days. Furthermore, temporally stratified logistic regression demonstrated that initial exposure (0 to 30 days) correlated strongly with nausea and vomiting, which attenuated in extended sessions, whereas extended exposure (>2 years) uncovered late-onset risks, notably incident hepatic steatosis. This time-aware framework shows that GLP-1 safety profiles are strongly duration-dependent, providing insight into both acute intolerance and long-term medication safety.
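
The abstract does not say which disproportionality statistic was used. As an illustration only, the sketch below computes one common choice, the reporting odds ratio (ROR) with a 95% confidence interval, from a 2x2 table of spontaneous reports. The function name and the counts are hypothetical.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """ROR and 95% CI from a 2x2 report table.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    ror = (a * d) / (b * c)
    # Standard error of log(ROR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper


# Hypothetical counts, not FAERS data.
ror, lower, upper = reporting_odds_ratio(a=40, b=960, c=200, d=18800)
print(f"ROR = {ror:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A lower confidence bound above 1 is a frequently used signal criterion, although thresholds vary between studies.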

3

Widespread Hyperalgesia Predicts Mortality in Pancreatic Adenocarcinoma

Faghih, M.; Damm, M.; Kassik, M.-T.; Cheesman, L.; Rauschenberg, S.; Olesen, S. S.; Laheru, D. A.; Zheng, L.; Phillips, A. E.; Yadav, D.; Drewes, A. M.; Rosendahl, J.; Singh, V. K.; International Pancreatic Pain Consortium

2026-05-27 gastroenterology 10.64898/2026.05.19.26353594 medRxiv

Top 4%

0.7%

Pain in pancreatic ductal adenocarcinoma (PDAC) is associated with poor survival, but whether altered pain processing carries prognostic significance is unknown. We analyzed a prospective cohort of 143 patients with PDAC who underwent pancreatic quantitative sensory testing (P-QST) after diagnosis. Patients were classified as having normal pain processing (n=84), segmental hyperalgesia (n=30), or widespread hyperalgesia (n=29). Survival was measured from the date of P-QST assessment. During follow-up, 70 deaths occurred. Widespread hyperalgesia was associated with increased mortality in unadjusted Cox analysis (HR 1.96, 95% CI 1.14-3.35) and after adjustment for age, sex, tumor stage, comorbidity, opioid treatment, and body mass index (adjusted HR 2.33, 95% CI 1.30-4.15). Segmental hyperalgesia was not associated with mortality. Kaplan-Meier analysis demonstrated lower survival probability in the widespread hyperalgesia group (log-rank p=0.025). These findings suggest that widespread hyperalgesia, reflecting altered central pain processing, identifies a subgroup of PDAC patients at increased risk of mortality independent of conventional clinical factors.

4

Phenome-Wide Association Study of Pre-Cancer Diagnosis Electronic Health Records Identifies Risk and Inverse Associations in the All of Us Research Program

Rich, C. C. D.; Bang, E. J.; Bair, A. B.; Richardson, B. E.; Millington, J. L.; Bates, B. A.; Davis, M. F.; Bailey, M. H.

2026-05-28 health informatics 10.64898/2026.05.26.26353823 medRxiv

Top 4%

0.5%

Background: The All of Us Research Program represents a rich resource for cancer epidemiology research, with over 400,000 participants with whole genome sequences linked to electronic health records (EHR). Large cancer datasets often focus exclusively on cases without controls and neglect pre-diagnosis healthcare occurrences. Here, we perform a phenome-wide association study (PheWAS) of EHR data at least 1 year pre-diagnosis between cancer cases and matched controls, revealing co-occurring and mutually exclusive phenotypes. Methods: We identified more than 55,000 cancer cases across 21 cancer types in All of Us version 8. To eliminate age-related confounding, we implemented a two-stage matching and censoring strategy: loose matching on demographics to establish index dates and cohort comparability, followed by right-censoring of EHR data (excluding 1 year pre-diagnosis/index), then 1:2 matching to address residual demographic imbalance. We tested associations between 23,193 cancer cases, 46,386 matched controls, and approximately 1,600 clinical phenotypes using logistic regression adjusted for sex at birth, self-reported race, age at diagnosis/index date, and two censored EHR metrics (observation window and unique condition count), with Bonferroni correction for multiple testing. Results: Our analysis identified 232 significantly associated phenotypes, confirming established cancer risk factors including elevated prostate-specific antigen (OR = 2.92, 95% CI: 2.65-3.23; p = 1.8 × 10^-101) and multinodular goiter (OR = 1.73, 95% CI: 1.56-1.91; p = 6.7 × 10^-27). Further investigation into the relationship between several phenotypes with seemingly inverse effects is warranted. Conclusions: This PheWAS of EHR data at least 1 year pre-diagnosis leveraged the diversity of All of Us to examine how clinical phenotypes prior to cancer diagnosis vary across cancer types and racial groups. Our findings validate All of Us as a robust platform for cancer epidemiology research, confirming established risk factors at scale across diverse populations. This work provides methodological insights for EHR-based susceptibility analyses and demonstrates the value of agnostic phenome-wide approaches for generating hypotheses in precision medicine.
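
To make the Bonferroni step concrete, the sketch below derives a per-test significance threshold and checks the two p-values quoted in the abstract against it. The family-wise alpha of 0.05 is an assumption (the abstract does not state it), and 1,600 is the abstract's approximate phenotype count, so the threshold is indicative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha; approximate number of phenotypes from the abstract.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1600)

# p-values as reported in the abstract.
p_values = {"elevated PSA": 1.8e-101, "multinodular goiter": 6.7e-27}
significant = {name: p < threshold for name, p in p_values.items()}

print(f"threshold = {threshold:.3e}")
print(significant)
```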

5

Cell-Free DNA Genomic and Fragmentomic Features for Early Outcome Prediction in Large B-Cell Lymphoma

Wang, S.; Mapar, P.; Moldovan, N.; van der Pol, Y.; Safrastyan, A.; van Werkhoven, E.; Tantyo, N. A.; Snieder, B.; Do Brito Valente, A. F.; de Jong, A. V.; Dinmohamed, A.; Drees, E. E. E.; Roemer, M. G. M.; Ylstra, B.; Klerk, C. P. W.; Strobbe, L.; Sandberg, Y.; Boersma, R. S.; Koene, H.; Pruijt, H.; de Heer, K.; van Rijn, R.; Bilgin, Y. M.; de Jongh, E.; Nijland, M.; van der Poel, M.; Koster, A.; Nieuwenhuizen, L.; Fijnheer, R.; Beeker, A.; Mous, R.; Vergote, V. K. J.; Vermaat, J. S. P.; Pegtel, D. M.; Chamuleau, M. E. D.; Mouliere, F.

2026-05-30 oncology 10.64898/2026.05.29.26353426 medRxiv

Top 4%

0.5%

Curative-intent immunochemotherapy fails in ~30% of patients with large B-cell lymphoma (LBCL), yet no validated molecular tool enables early identification of high-risk individuals to guide treatment intensification. Using shallow whole genome sequencing (sWGS) of plasma cell-free DNA from 190 LBCL patients, we developed and validated the ACT score (Aberrations, fragment Composition, Terminal motifs), a composite classifier integrating genomic and fragmentomic features from a single post-cycle-1 sample. ACT-positive patients had worse 2-year outcomes than ACT-negative patients: time-to-progression 29% vs. 83% (HR 4.4, 95% CI 1.9-10.0; P = 1.5 × 10^-4) and overall survival 47% vs. 93% (HR 8.7, 95% CI 3.0-25.4; P = 1.8 × 10^-6). The ACT score was prognostic independently of the International Prognostic Index, and their combination identified the highest-risk patients. Unlike mutation-based approaches, this assay requires no tumor tissue, germline control, or baseline plasma sample. Built on open-source tools and sWGS, the ACT score offers a feasible, scalable strategy for early risk stratification in aggressive LBCL.

6

A priority index-based computational medicine framework (PimRNA) for prioritising personalised mRNA cancer vaccines

Fang, H.; Tan, T.

2026-05-29 oncology 10.64898/2026.05.26.26354114 medRxiv

Top 5%

0.4%

Background: The development of personalised mRNA cancer vaccines holds considerable promise for oncology, yet a significant translational gap persists between neoantigen identification and the selection of therapeutically impactful targets. Current approaches predominantly prioritise human leukocyte antigen (HLA) binding affinity and immunogenicity, often overlooking the systems-level biological context of the target. This can inadvertently favour immunogenic but biologically peripheral peptides that exert limited influence on tumour signalling networks, thereby constraining vaccine efficacy. Furthermore, mRNA therapeutics must satisfy additional design requirements, including favourable codon usage and favourable secondary-structure stability, which directly affect in vivo translation and half-life. A unified computational framework that integrates neoantigen discovery with network biology is therefore critically needed. Results: Here, we present PimRNA, a Priority index (Pi)-centric computational medicine framework that bridges this gap by unifying neoantigen identification, mRNA sequence optimisation, and gene interaction network analysis. First, high-confidence tumour-specific HLA class I and II neoantigenic peptides are identified from paired tumour-normal genomic and tumour transcriptomic data using NeoDisc. Second, the coding sequences of these peptides are optimised for stability and translational efficiency with LinearDesign, yielding a core set of neoantigen-encoding mRNAs. Third, a random walk with restart algorithm is applied to a knowledgebase of gene interactions to identify peripheral genes exhibiting significant network connectivity to core genes, generating a gene-predictor matrix in which each gene is assigned an affinity score reflecting its network proximity to immunogenic neoantigens. 
These scores are consolidated into a single, unified priority rating (0-5) for each gene, followed by subnetwork analysis that reveals therapeutically relevant gene modules. Application of PimRNA to breast cancer and melanoma datasets demonstrates that it successfully selects high-confidence immunogenic neoantigen candidates embedded within biologically meaningful tumour-specific networks. Conclusion: PimRNA provides a systems biology foundation for mRNA vaccine design, moving beyond isolated immunogenicity to prioritise targets that are both highly presented and central to tumour-relevant biological networks. This framework offers a generalisable strategy for the rational discovery and prioritisation of mRNA therapeutics, significantly advancing the field of computational medicine towards personalised cancer vaccines.
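
The random walk with restart step described in this abstract can be sketched as follows. This is a minimal illustration of the general algorithm on a toy graph, not the PimRNA implementation: the graph, node names, restart probability, and function name are all hypothetical, and the sketch assumes every node has at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for each node.

    adj: dict mapping node -> list of neighbours (undirected graph)
    seeds: set of seed ("core") nodes the walker restarts from
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed node.
        nxt = {v: restart * p0[v] for v in nodes}
        # Otherwise, move to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy chain A - B - C - D - E with A as the only seed (hypothetical genes).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = random_walk_with_restart(graph, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Scores fall with network distance from the seed, which is the sense in which such a score reflects proximity to the core genes.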

7

T cell transcriptional and receptor signatures predict response to telomerase vaccination in prostate cancer

Hoye, E.; Natkin, R.; Sajnani, K.; Engedal, N.; Simensen, J. E.; Hakkola, S.; Kiviaho, A.; Ballesio, F.; Cecchetto, T.; Ellingsen, E. B.; Westhrin, M.; Hovig, E.; Mathelier, A.; Visakorpi, T.; Tammela, T. L.; Murtola, T. J.; Eerola, S.; Nykter, M.; Lilleby, W.; Urbanucci, A.

2026-05-30 oncology 10.64898/2026.05.25.26354038 medRxiv

Top 5%

0.4%

Although prostate cancer (PC) is considered immunologically cold, which limits the efficacy of immune checkpoint inhibitors, therapeutic vaccination targeting tumor-associated antigens is an attractive strategy to promote disease control in patients with low-volume metastatic disease. The UV1 cancer vaccine is based on immunization with tripeptide fragments from human telomerase reverse transcriptase (hTERT), and a phase II clinical trial demonstrated induction of a robust T cell response in men with de novo metastatic castration-sensitive prostate cancer (mCSPC). Comparison with long-term survival data from non-metastatic CSPC patients as a reference showed that, despite metastatic disease at diagnosis, UV1-treated patients who mounted an early vaccine-induced immune response achieved progression-free and overall survival comparable to those of non-metastatic patients. We examined biological determinants of clinical benefit following UV1 vaccination, including tumor transcriptome and T cell receptor (TCR) profiling of circulating and tissue-resident T cells from the 22 men enrolled. Analysis of diagnostic and post-UV1 treatment biopsies revealed that low baseline T cell exhaustion and higher CD8+ T cell abundance are associated with an early immune response to the vaccine and longer survival. Moreover, we identified specific TCR motifs associated with early responders that may indicate potential benefit from UV1 vaccination. These findings indicate that baseline intratumoral T cell exhaustion state and repertoire shape responsiveness to hTERT vaccination and long-term outcome. Overall, our study underlines how baseline immune profiling may be used as a companion biomarker to identify the mCSPC patients most likely to benefit from therapeutic vaccination.

8

Locally adaptive conformal prediction intervals for polygenic score-based phenotype prediction via residual normalization and data-driven stratification

Yun, Y.; Hao, X.; Zhang, Y. D.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.28.26354326 medRxiv

Top 5%

0.4%

Quantifying uncertainty in polygenic score (PGS)-based phenotype prediction is crucial for the integration of genomic data into precision medicine. While the PGS provides a fundamental pivot for point estimation, clinical decision-making necessitates the construction of well-calibrated prediction intervals that reliably encompass the true phenotypic values. However, phenotypic residuals are frequently characterized by complex heteroscedasticity and stratified variance structures across diverse demographic contexts. Existing approaches often rely on global calibration mechanisms, which fail to account for such localized variance structures and lead to systematic miscalibration within specific subpopulations. To bridge this gap, we propose Clustering-based Split Conformal Prediction with Normalized Residuals (C-SCNR), a versatile framework based on Split Conformal Prediction. By adopting residual normalization and incorporating a repetitive "split-and-cluster" mechanism, C-SCNR dynamically identifies latent error strata and applies fine-grained adjustments to the resulting intervals. Our framework requires no distributional assumptions regarding the phenotype, is compatible with any PGS method, and flexibly accommodates biologically informed grouping. Simulation studies demonstrate that our framework consistently outperforms existing methods across diverse error distributions. In real-data applications analyzing body mass index (BMI), low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol in the UK Biobank, C-SCNR effectively resolves the coverage deficiencies of existing methods in specific subgroups and consistently yields superior localized calibration. Overall, C-SCNR represents a flexible and powerful framework for constructing high-resolution, context-specific prediction intervals, thereby facilitating more reliable clinical interpretations of polygenic risk.
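
As background for the residual-normalization idea, the sketch below shows plain split conformal prediction with normalized residuals: calibration scores are absolute residuals divided by a per-sample scale estimate, and the interval for a new sample is widened in proportion to its own scale. It does not implement C-SCNR's split-and-cluster step. All names and numbers are hypothetical, and the point predictions and scale estimates are assumed to come from models fitted on a separate training split.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def normalized_interval(y_cal, mu_cal, sigma_cal, mu_new, sigma_new, alpha):
    """Prediction interval mu_new +/- q * sigma_new from normalized residuals."""
    scores = [abs(y - m) / s for y, m, s in zip(y_cal, mu_cal, sigma_cal)]
    q = conformal_quantile(scores, alpha)
    return mu_new - q * sigma_new, mu_new + q * sigma_new


# Hypothetical calibration data: observed phenotype, prediction, scale estimate.
y_cal = [10, 12, 9, 15, 11, 14, 8, 13, 10]
mu_cal = [11, 11, 10, 13, 11, 12, 9, 12, 12]
sigma_cal = [1, 2, 1, 2, 1, 2, 1, 2, 1]

# Same point prediction, different estimated scale: the interval adapts.
wide = normalized_interval(y_cal, mu_cal, sigma_cal, mu_new=12, sigma_new=2, alpha=0.2)
narrow = normalized_interval(y_cal, mu_cal, sigma_cal, mu_new=12, sigma_new=1, alpha=0.2)
print(wide, narrow)
```

Computing the quantile separately within each stratum of calibration samples would be one way to localize calibration further; how C-SCNR forms its strata is described only at a high level in the abstract.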

9

Deep Learning Spatial Profiling of CD103+CD8+ T Cells and Survival in Rectal Cancer After Neoadjuvant Chemoradiotherapy

Abe, T.; Yamashita, K.; Nagasaka, T.; Fujita, M.; Ueda, Y.; Miyake, S.; Ito, R.; Adachi, Y.; Ando, M.; Tsuneki, T.; Okazoe, Y.; Konaka, R.; Takahashi, T.; Kagiyama, H.; Tachibana, T.; Imai, M.; Yoshida, T.; Saito, M.; Mukohyama, J.; Kanayama, K.; Koma, Y.-I.; Otowa, Y.; Hasegawa, H.; Ikeda, T.; Koterazawa, Y.; Aoki, T.; Harada, H.; Urakawa, N.; Goto, H.; Kanaji, S.; Yanagimoto, H.; Matsuda, T.; Takamura, S.; Yamashita, T.; Sasaki, R.; Fukumoto, T.; Kakeji, Y.

2026-05-28 oncology 10.64898/2026.05.26.26353629 medRxiv

Top 5%

0.4%

Background: CD8+ tumor-infiltrating lymphocytes (TILs) are established prognostic markers in colorectal cancer, yet the clinical significance of CD103+CD8+ tissue-resident memory-like (TRM-like) T cells in locally advanced rectal cancer (LARC) after neoadjuvant chemoradiotherapy (NACRT) remains unknown. Methods: We quantified CD8+ and CD103+CD8+ T-cell densities in stromal and intratumoral compartments of post-NACRT resection specimens from 40 LARC patients using Cu-Cyto, a deep learning-based imaging cytometry platform. Associations with survival, pathological response, and adjuvant chemotherapy (AC) were examined. Treatment-induced T-cell dynamics were assessed in paired pretreatment biopsies and post-NACRT resections (n = 9). Results: High stromal CD103+CD8+ density independently predicted better 5-year RFS (67.4% vs. 12.1%, p < 0.001) and OS (80.0% vs. 26.6%, p = 0.016); intratumoral density showed no prognostic significance. Pathological response correlated with stromal CD8+ but not CD103+CD8+ density. Paired analysis revealed a selective non-expansion of the CD103+ subset: stromal CD8+ T cells increased significantly after NACRT while CD103+CD8+ density remained unchanged. AC may preferentially benefit patients with low stromal CD103+CD8+ density. Conclusions: Stromal CD103+CD8+ T-cell density is a robust independent prognostic biomarker in rectal cancer after NACRT that appears to reflect pre-existing rather than treatment-induced immunity. Given its stability across NACRT, pretreatment biopsy assessment may provide equivalent prognostic information, with potential implications for patient stratification before treatment initiation.

10

Immune Checkpoint Response Profiles and Resistance Mechanisms in NSCLC Revealed by Circulating Extracellular Vesicle Proteomics

Taylor, C.; Davey, M.; Allain, E. P.; Cheema, A. S.; Crapoulet, N.; Finn, N.; Abd, M.; Ouellette, R.

2026-05-26 oncology 10.64898/2026.05.25.26354042 medRxiv

Top 5%

0.3%

Background: Immuno-oncology has revolutionized cancer treatment, but some patients fail to benefit due to primary resistance and tumour-immune evasion. Extracellular vesicles (EVs) are secreted by both tumour and immune cells and mediate communication between cancer cells and the immune system. Our study used proteomic profiling of circulating EVs collected from NSCLC patients treated with immune checkpoint inhibitors (ICI) to identify predictive biomarkers of response as well as immune evasion mechanisms related to treatment resistance. Methods: EVs were isolated from plasma collected prior to ICI treatment using peptide-affinity purification, and high-throughput proteomics was performed using the Proximity Extension Assay. Differentially expressed EV proteins between durable responders (DR) and non-durable responders (NDR) were identified and evaluated using Cox proportional hazards regression, survival analysis, sex-stratified analysis, and pathway and network analysis. Results: Proteomics analysis identified 116 differentially expressed EV proteins between DR and NDR. NDR was characterized by enrichment of inflammatory, angiogenic, and immune-suppressive EV proteins, such as IL1RL1, TFRC, IL6ST, galectins, TNF superfamily death receptors, chemokines, and PCSK9. Pathway analysis revealed enrichment of angiogenesis, chemotaxis, ECM remodeling, and neutrophil degranulation associated with poor progression-free survival (PFS). In contrast, DR to ICI treatment was associated with EV proteins related to T- and B-cell activation and adaptive immunity. Sex-related differences in abundance and association with PFS were observed for certain EV proteins, including IL1RL1 and TFRC. A six-protein EV model (IL1RL1, TFRC, ERI1, CCN5, IGFBPL1, and TNFRSF13C) demonstrated good prognostic performance for identifying NDR (AUC = 0.907) and stratified patients into three discrete risk groups. Conclusions: High-plex EV proteomics revealed biologically coherent tumour-immune signaling programs that are associated with ICI treatment resistance. Profiling circulating EVs may improve our understanding of EV-mediated immune evasion mechanisms and identify protein signatures that reflect the tumour immune microenvironment and predict response to immune checkpoint blockade.

11

Redefining Extent Of Resection After Meningioma Surgery: a Multicentre Observational Machine Learning Analysis Comparing Simpson, Radiological and Volumetric Grading

Pandit, A. S.; Deehan, M.; Moudgil-Joshi, J.; Reischer, G.; Mathew, S.; Pace, G.; Fatania, G.; Dalton, A.; Nair, R.; Hyare, H.; Mallon, D.; Kitchen, N.; Marcus, H. J.; Nachev, P.

2026-05-27 oncology 10.64898/2026.05.23.26353944 medRxiv

Top 6%

0.3%

Background: Extent of resection remains central to meningioma management, yet Simpson grading is subjective and may not reflect measurable postoperative residual disease. We compared surgeon-reported Simpson grade, report-derived radiological grading, and residual tumour volumetry across a multicentre cohort. Methods: We performed a retrospective study across two tertiary neurosciences centres comprising four hospitals, including patients undergoing primary cranial meningioma resection from 2006 to 2025. Postoperative magnetic resonance imaging (MRI) reports were harmonised using weakly supervised natural language processing based on term frequency-inverse document frequency (TF-IDF) and a linear support vector machine classifier. Residual tumour volume was segmented from contrast-enhanced postoperative MRI and log-transformed. Concordance between Simpson and radiological gross-total/subtotal resection classification was assessed using absolute agreement and prevalence-adjusted bias-adjusted kappa (PABAK). Cox models assessed recurrence-free survival, with bootstrap validation and anatomical and scan-timing sensitivity analyses. Results: Among 912 patients, recurrence or residual progression occurred in 281. Surgical-radiological agreement was substantial but imperfect (absolute agreement 74%; PABAK 0.61), with lower agreement in skull-base and parafalcine-parasagittal tumours. In adjusted models, recurrence hazard increased with Simpson grade (hazard ratio 1.54, 95% confidence interval 1.37-1.72), radiological grade (1.92, 1.68-2.20), and log-transformed residual volume (1.20, 1.16-1.24; all p<0.0005). Optimism corrected concordance increased from Simpson grade to radiological grade and log-volumetry (0.692, 0.733, and 0.748), with this ranking preserved across sensitivity analyses. Conclusions: Imaging-based postoperative residual disease measures outperformed Simpson grade. 
TF-IDF-assisted report-derived grading provides a scalable bridge to volumetry, while quantitative residual volume offers the strongest prognostic representation.

12

Using artificial intelligence for radiotherapy clinical trial quality assurance: analysis of a multi-institutional clinical trial for neurovascular-sparing prostate stereotactic ablative radiotherapy

Doucette, M.; Zhang, Y.; Liao, C.-Y.; Lin, M.-H.; Yan, Y.; Dess, R. T.; Tendulkar, R. D.; Garant, A.; Hannan, R.; Jiang, S.; Nguyen, D.; Desai, N.; Yang, D. X.

2026-05-29 health informatics 10.64898/2026.05.27.26354252 medRxiv

Top 6%

0.3%

Our study evaluated whether a deep learning auto-segmentation model combined with machine learning triage can streamline radiotherapy clinical trial quality assurance (QA). We analyzed 107 stereotactic ablative radiotherapy (SABR) cases from a multi-institutional phase II clinical trial of neurovascular-sparing prostate SABR, focusing on physician contours of the internal pudendal artery (IPA) as a novel organ-at-risk with substantial interobserver variability. Contours were scored by the trial principal investigator as Per-Protocol or Minor Deviation/Unacceptable. We applied a deep learning model for IPA auto-segmentation. Agreement between human and AI contours was then quantified using 14 overlap, distance, and surface metrics, and a supervised classifier was trained on these metrics to flag clinical trial protocol deviations. AI segmentation achieved only modest geometric accuracy, with a mean Dice similarity coefficient of 0.446 and a 95th percentile Hausdorff distance of 14.23. Nevertheless, a machine learning classifier incorporating all 14 metrics yielded an AUROC of 0.836 and, on the 27-case hold-out set, flagged all Minor Deviation/Unacceptable cases (100% sensitivity, no false negatives) with 6 false positives. AI segmentation combined with metrics-based machine learning can triage protocol deviations within a multi-institutional radiotherapy clinical trial, supporting prospective evaluation of AI-assisted trial QA.
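
The Dice similarity coefficient quoted above measures overlap between two segmentations as twice the shared volume divided by the sum of the two volumes. The sketch below computes it for two small voxel sets; the coordinates are hypothetical and unrelated to the trial data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


# Hypothetical voxel sets for a physician contour and a model contour.
physician = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
model = {(1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0), (2, 2, 0)}

dsc = dice_coefficient(physician, model)
print(f"Dice = {dsc:.3f}")
```

For thin tubular structures such as small arteries, modest spatial offsets remove most of the overlap, which is one reason distance and surface metrics are often reported alongside Dice.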

13

The Impact of Non-coding G-quadruplex Variants on Human Traits and Disease Susceptibility

Sharma, R.; Hu, F.; Li, X.; Campos, R.; Kundu, K.; Atanur, S.; Karpinski, M.; Wasilewski, S.; MacArthur, S.; Vitsios, D.; Dhindsa, R. S.; Georgakopoulos-Soares, I.; Burren, O. S.; Petrovski, S.; Mustoe, A. M.; Wang, Q.; Glodzik, D.; Zou, X. Z.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.29.26354456 medRxiv

Top 7%

0.2%

Non-coding variants are important contributors to human traits and diseases but linking them to molecular mechanisms and phenotypes at scale remains challenging. G-quadruplexes (G4s) are four-stranded structures formed by guanine-rich sequences and have emerged as key functional elements within the non-coding genome. G4s are enriched in regulatory regions and can modulate gene expression at both the DNA and RNA levels, influencing transcription, replication, and RNA processing, positioning them as key mediators linking non-coding variation to complex biological traits. Here, we profile putative G4s across five regulatory regions in 459,449 UK Biobank genomes and perform phenome-wide association analyses spanning 2,941 plasma protein abundances, 13,321 binary traits, and 1,682 quantitative traits. We show that putative G4-modifying variants are depleted under purifying selection despite elevated local mutability and drive large, bidirectional associations with plasma proteins and clinical traits, including associations not captured by coding variants. Using a mechanism-aware collapsing strategy that groups rare non-coding variants by their predicted impact on G4 stability, we achieved stronger gene-level signals than those obtained with standard rare-variant collapsing approaches. Integrating non-coding and protein-truncating variants (PTVs) increases discovery power, revealing 843 significant associations missed by the PTV-only model. Replication in the Alliance for Genomic Discovery cohort demonstrates cross-cohort robustness. Our study suggests G4s as widespread mediators of non-coding regulation and provides a framework for mechanism-informed target discovery and prioritization across the non-coding genome.

14

Determinants of cancer care delays in Kinshasa, Democratic Republic of the Congo (DRC)

Dusingize, J. C.; Zotova, N.; Kabarriti, R.; Sehrawat, K.; Babakazo, P.; Alisho, A. S.; Kasindi, F. L.; Yessoufou, I.; Yotebieng, M.

2026-05-26 oncology 10.64898/2026.05.19.26353550 medRxiv

Top 7%

0.2%

PURPOSE: Cancer outcomes in sub-Saharan Africa are driven by delayed diagnosis and treatment initiation. We evaluated the magnitude and determinants of diagnostic and treatment delays among cancer patients in Kinshasa, Democratic Republic of the Congo (DRC). METHODS: We conducted a hospital-based cross-sectional study of 460 adults with confirmed cancer at Nganda Hospital Center in Kinshasa, DRC. Two outcomes were assessed: delay from symptom onset to diagnosis and delay from diagnosis to treatment initiation. Log-normal regression models were fitted for each outcome to estimate adjusted geometric mean ratios (aGMRs) and 95% confidence intervals (CIs). Covariates included demographic, socioeconomic, clinical, behavioral, and stigma-related factors. RESULTS: The median age was 55 years, and 76.2% of participants were women. Overall, 55.0% of participants experienced symptom-to-diagnosis delays >6 months, and 49.4% experienced diagnosis-to-treatment delays >3 months. Older age was associated with longer diagnostic delay (aGMR 1.55, 95% CI 1.03-2.31) and treatment delay (1.51, 1.07-2.14). Unemployment was strongly associated with both diagnostic delay (1.68, 1.15-2.47) and treatment delay (2.27, 1.54-3.33), as was hepatitis B co-infection (1.88, 1.06-3.34 and 2.42, 1.15-5.11, respectively). Longer diagnostic delay was additionally associated with informal trading (1.99, 1.21-3.28), taxi or motorbike transport (1.92, 1.25-2.94), and smoking history (2.25, 1.03-4.91), while high cancer-stereotype stigma was associated with longer treatment delay (1.56, 1.04-2.34). CONCLUSION: Substantial delays exist across the DRC cancer care continuum, driven by socioeconomic vulnerability, transport barriers, hepatitis B co-infection, and cancer-related stigma. 
These findings highlight the need for integrated interventions to improve timely diagnosis and treatment initiation, including strengthening financial protection, decentralizing cancer services, and reducing stigma in cancer care.

15

A TAD-informed aging-brain xQTL atlas of multi-modal and cell-type-resolved regulatory variation

Cifello, J.; Feng, R.; Grenn, F. P.; Carter, L.; Liu, A.; Sun, H.; Li, R.; Empawi, J. A.; Greenfest-Allen, E.; Katanic, Z.; Valladares, O.; Kuzma, A. B.; White, H.; Farrer, L. A.; Goate, A. M.; Raj, T.; Wang, M.; Cruchaga, C.; Wang, L.-S.; Klein, H.; De Jager, P. L.; Chen, H.; Marcora, E.; TCW, J.; Zhang, X.; Kuksa, P. P.; Wang, G.; Leung, Y. Y.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.21.26353713 medRxiv

Top 7%

0.2%

Understanding the regulatory consequences of genetic variation in the aging human brain requires molecular maps that span brain regions, cell types, and regulatory modalities. We present the Alzheimer's Disease Sequencing Project Functional Genomics (FunGen-AD) xQTL Atlas, a harmonized resource of molecular quantitative trait loci from four postmortem brain studies: ROSMAP, MSBB, Knight-ADRC, and MiGA. The atlas integrates histone acetylation, DNA methylation, gene expression, splicing, and protein abundance QTLs across 14 brain regions, 7 major cell types, and 17,566 samples, with standardized association, significance-filtered, and fine-mapping outputs. To expand discovery beyond conventional 1-Mb cis windows, we include variants within topologically associating domains (TADs) and their boundaries where appropriate, identifying on average 21% more variant-molecular-trait associations per dataset. Statistical fine-mapping reduced broad association sets by 95% into credible sets of candidate regulatory variants. Distributed through the NIAGADS xQTL portal and bulk-download services, the atlas provides a comprehensive functional-genomic foundation for interpreting genetic risk variants in Alzheimer's disease and aging-brain research.

16

Connecting Baseline Immune Exhaustion in Hot Tumors to Oral Cancer Recurrence and Nodal Metastasis

Shaikh, S.; Basu, S.; Hajihosseini, M.; Nandy, S. K.; Moorthy, M.; Arun, I.; Lali, B. S.; Arun, P.; Mukherjee, G.; Pyne, S.

2026-05-30 oncology 10.64898/2026.05.27.26354295 medRxiv

Top 8%

0.2%

Background: The use of immune checkpoint inhibitors (ICIs) in the treatment of cancer has rapidly expanded over the last decade. However, there are several knowledge gaps in understanding how tumor cells evade the immune system. There is a paucity of data in HPV-negative oral cancer, particularly of the gingivobuccal region. Understanding the mechanism of immune system evasion in this cancer is vital for improving patient outcomes. Methods: We characterized the baseline immune milieu of oral cancer using immunohistochemistry (IHC) on whole tumor sections from 124 cases. Tumors were classified as hot or cold and further stratified into high-risk and low-risk groups. High-risk patients included those with lymph node metastasis at diagnosis/recurrence or distant metastasis within 2 years of treatment completion. Patients without these features were categorized as low risk. Validation by RNA-Seq and Joint Enrichment Analysis of Oncogenic and Immunologic Pathways was carried out in a subset of 46 cases. Results: Hot high-risk tumors (by IHC) were distinguished by elevated PD-L1 expression and reduced NK-cell, PD1, and CTLA-4 expression. There was no difference in the expression levels of CD3+, CD8+, granzyme, or perforin compared to hot low-risk tumors, findings that align with the definition of hot tumors. RNA-Seq revealed a gene signature associated with exhausted T-cells in hot high-risk tumors. Gene and pathway analyses identified differential upregulation of isoform-specific TOX, TCF, CXCR, RUNX, IRF, BRD and BCL6 genes, implicating immune cell exhaustion and tumor aggressiveness. Significantly downregulated genes included PDCD1, HAVCR2, ZAP70, and STAT, indicative of a disabled immune microenvironment. These findings suggest that the state of immune exhaustion in hot high-risk (HHR) tumors is driven by progenitor exhausted T-cells and terminally exhausted T-cells, independent of PD1-TIM3. Conclusion: These findings suggest that combining TOX/TCF/BCL6 inhibitors with immune checkpoint inhibitors in the adjuvant setting might benefit patients with hot high-risk tumors. Given the results, testing for a targeted exhaustion-related gene panel at diagnosis is recommended for oral cancers to stratify tumors as high-risk or low-risk. Larger validation studies and clinical trials are now warranted.

17

Survival and neurologic outcomes after re-irradiation in children with diffuse midline glioma and diffuse intrinsic pontine glioma

Vaziri, T.; Vyas, D.; Alhumaid, M.; Lucas, C.-H.; Guryildirim, M.; Kilburn, L.; Gartrell, R. D.; Koldobskiy, M. A.; Raabe, E.; Cohen, K.; Ladra, M.; Acharya, S.

2026-06-01 oncology 10.64898/2026.05.29.26354429 medRxiv

Top 8%

0.2%

Background: Reirradiation (reRT) is increasingly offered following progression in diffuse intrinsic pontine glioma (DIPG) and diffuse midline glioma (DMG), though optimal patient selection remains a challenge. This study evaluated clinical outcomes after reRT in a contemporary cohort of patients with DIPG/DMG. Methods: Patients <26 years old with DMG/DIPG treated with radiation therapy between 2011 and 2025 were retrospectively reviewed. Primary endpoints included overall survival (OS2) and progression-free survival (PFS2), measured from first progression, and change in neurologic symptoms after reRT. Survival was estimated using Kaplan-Meier methods, with Cox proportional hazards modeling for prognostic factors. Results: Fifty-eight patients were included; 37 (63.8%) underwent reRT. Tumors were predominantly pontine (74.1%). ReRT was associated with improvement in motor function (51.4% vs. 9.5%, p=0.002), cranial nerve function (29.7% vs. 4.8%, p=0.044), and gait ataxia (35.1% vs. 9.5%, p=0.059). Median OS2 and PFS2 were improved with reRT (OS2: 9.67 vs. 2.57 months, p<0.001; PFS2: 5.63 vs. 1.57 months, p<0.001). OS2 was independently associated with reRT (HR 0.27, p<0.0001), pontine location (HR 2.94, p=0.004), and steroid use at progression (HR 4.12, p=0.001). PFS2 was independently associated with reRT (HR 0.23, p<0.0001) and distant pattern of failure (HR 2.83, p=0.037). Among reRT patients, non-pontine location was associated with improved OS2 (p=0.02), and local failure was associated with improved PFS2 (p=0.003). Conclusion: ReRT was associated with neurologic improvement and prolonged survival. Patients with non-pontine tumors or local-only failure might derive the greatest benefit. Prospective studies are warranted to define optimal dose/fractionation and refine patient selection.

18

Antibiotic Timing and Survival After Immune Checkpoint Inhibitor Initiation in Patients With Cancer

Zhang, K.; John, D.; Li, W. T.; Hogarth, M.; McKay, R. R.; Ongkeko, W. M.

2026-05-28 oncology 10.64898/2026.05.27.26354193 medRxiv

Top 9%

0.2%

Importance: While gut dysbiosis is known to impair response to immune checkpoint inhibitors (ICIs), the relative clinical impact of antibiotic timing (pre- vs. post-ICI initiation) remains unclear. Objective: To evaluate whether antibiotic timing differentially influences overall survival (OS) in a large, multi-institutional pan-cancer cohort. Design, Setting, and Participants: This retrospective cohort study used deidentified electronic health record data from six academic medical centers within the University of California Health system. We included 21,108 adults with any malignancy who received PD-1, PD-L1, or CTLA-4 inhibitors between January 2014 and December 2024. Exposures: Antibiotic exposure windows were categorized as pre-only (-60 to -1 days), post-only (+1 to +60 days), both windows, or none. Main Outcomes and Measures: The primary outcome was OS calculated from the first ICI dose. Multivariable Cox proportional hazards models adjusted for demographics, tumor type, line of therapy, and baseline health indicators (albumin, NLR, and recent hospitalization). Results: Among 21,108 patients, 17.3% had pre-only exposure, 13.3% had post-only exposure, and 60.6% had no exposure. In the multivariable model, post-only exposure (HR, 1.27; 95% CI, 1.20-1.35) and combined pre- and post-exposure (HR, 1.31; 95% CI, 1.23-1.40) were significantly associated with higher mortality. Pre-only exposure was not significantly associated with OS (HR, 1.04; 95% CI, 0.99-1.10). Subgroup analyses by tumor type showed consistent trends across major malignancies, including head and neck (Post HR, 1.46) and renal cell carcinoma (Post HR, 1.26). Conclusions and Relevance: In contrast to some smaller studies, this large-scale analysis indicates that antibiotic exposure after ICI initiation carries a greater risk than exposure prior to treatment. These findings highlight the need for rigorous antibiotic stewardship strategies specifically during the early phases of immunotherapy treatment.
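
The exposure windows defined in this abstract (pre-only, post-only, both, none) can be sketched as a small classification rule. This is an illustrative reading of the stated windows rather than the authors' code: the function name is hypothetical, and the handling of antibiotics given on the day of the first ICI dose (day 0, which falls in neither stated window) is an assumption.

```python
from datetime import date


def exposure_category(ici_start, antibiotic_dates, window_days=60):
    """Classify antibiotic exposure relative to the first ICI dose.

    Windows follow the abstract: pre is -60 to -1 days, post is +1 to +60 days.
    Assumption: a dose on day 0 belongs to neither window and is ignored.
    """
    pre = post = False
    for d in antibiotic_dates:
        offset = (d - ici_start).days  # negative means before ICI initiation
        if -window_days <= offset <= -1:
            pre = True
        elif 1 <= offset <= window_days:
            post = True
    if pre and post:
        return "both"
    if pre:
        return "pre-only"
    if post:
        return "post-only"
    return "none"


# Hypothetical patient records, for illustration only.
start = date(2020, 3, 1)
example_pre = exposure_category(start, [date(2020, 2, 10)])
example_both = exposure_category(start, [date(2020, 2, 10), date(2020, 3, 15)])
example_none = exposure_category(start, [date(2019, 11, 1), date(2020, 3, 1)])
```

Each patient receives exactly one category, so the four groups are mutually exclusive, which is what allows them to enter a Cox model as a single categorical covariate with "none" as the reference level.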

19

DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv

Top 9%

0.1%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics, ranging from semantic similarity to large language model (LLM)-based judges, often fail to distinguish between clinically trivial and critical discrepancies and poorly reflect real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed-source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and supports the safer integration of generative AI into clinical workflows.

20

Cancer Prevalence and Patterns in Kilifi County: A 10-year Retrospective Descriptive Study

Masha, M.; Mbugua, R. W.; Abdullahi, M.; Sheikh, N. A.; Omar, A.; Abdihamid, O.

2026-06-01 oncology 10.64898/2026.05.20.26353643 medRxiv

Top 9%

0.1%

Background: Cancer is an increasing public health challenge in Kenya, particularly in rural and underserved regions where surveillance systems and diagnostic capacity remain limited. Kilifi County, located along the Kenyan coast, lacks a population-based cancer registry, and data on the local cancer burden are not available. This study aimed to characterize the demographic distribution of patients, the cancer burden in the county, and the management of cancer cases diagnosed at Kilifi County Referral Hospital (KCRH) over ten years. Methods: This retrospective study analyzed the patterns of cancer in Kilifi County using patient records from KCRH during the study period (January 1, 2014, to January 1, 2024). Results: A total of 101 patients with cancer were identified, 58% female, with a mean age of 54 years. Most patients were from Kilifi North (47%), with a high proportion reporting no formal occupation (41%) or farming (26%). Esophageal and cervical cancers were the most common (18% each), followed by breast and prostate cancers (5% each), with other malignancies occurring infrequently. Histopathology was the primary diagnostic modality (88%). Staging data were incomplete in 70% of cases; among documented cases, the majority presented with advanced disease (21% stage IV). Due to limited local treatment capacity, approximately half of the patients were referred to tertiary centers for chemotherapy, radiotherapy, or surgery. At data cut-off, 43% had died, 25% were on treatment, and 29% were lost to follow-up, with only 2% completing treatment or under follow-up. Conclusions: This study demonstrates a substantial cancer burden in Kilifi County and highlights critical gaps in diagnostic capacity, staging, and continuity of care. Strengthening cancer surveillance systems, expanding diagnostic and treatment infrastructure, and establishing a population-based cancer registry are essential to improving cancer outcomes and advancing equitable care in rural Kenya.